Kirkland
Leveraging LLMs to Create Content Corpora for Niche Domains
Zhang, Franklin, Zhang, Sonya, Halevy, Alon
Constructing specialized content corpora from vast, unstructured web sources for domain-specific applications poses substantial data curation challenges. In this paper, we introduce a streamlined approach for generating high-quality, domain-specific corpora by efficiently acquiring, filtering, structuring, and cleaning web-based data. We showcase how Large Language Models (LLMs) can be leveraged to address complex data curation at scale, and propose a strategic framework incorporating LLM-enhanced techniques for structured content extraction and semantic deduplication. We validate our approach in the behavior education domain through its integration into 30 Day Me, a habit formation application. Our data pipeline, named 30DayGen, enabled the extraction and synthesis of 3,531 unique 30-day challenges from over 15K webpages. A user survey reports a satisfaction score of 4.3 out of 5, with 91% of respondents indicating willingness to use the curated content for their habit-formation goals.
- North America > Canada > Ontario > Toronto (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Germany > Berlin (0.04)
- (6 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report (0.82)
- Information Technology > Services (0.46)
- Health & Medicine (0.46)
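The semantic-deduplication step named in the abstract can be sketched in a few lines. The bag-of-words "embedding" and the 0.7 similarity threshold below are illustrative stand-ins, not the paper's method — 30DayGen uses LLM-enhanced techniques rather than this toy similarity.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would use an LLM
    # or sentence-embedding model (hypothetical stand-in).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def dedupe(items, threshold=0.7):
    # Keep an item only if it is not too similar to anything kept so far.
    kept, vecs = [], []
    for item in items:
        v = embed(item)
        if all(cosine(v, u) < threshold for u in vecs):
            kept.append(item)
            vecs.append(v)
    return kept

challenges = [
    "drink 8 glasses of water daily",
    "drink 8 glasses of water every day",   # near-duplicate, dropped
    "read 10 pages each morning",
]
print(dedupe(challenges))
```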
What do professional software developers need to know to succeed in an age of Artificial Intelligence?
Kam, Matthew, Miller, Cody, Wang, Miaoxin, Tidwell, Abey, Lee, Irene A., Malyn-Smith, Joyce, Perez, Beatriz, Tiwari, Vikram, Kenitzer, Joshua, Macvean, Andrew, Barrar, Erin
Generative AI is showing early evidence of productivity gains for software developers, but concerns persist regarding workforce disruption and deskilling. We describe our research with 21 developers at the cutting edge of using AI, summarizing the 12 work goals we uncovered, together with 75 associated tasks and the skills & knowledge for each, illustrating how developers use AI at work. From all of these, we distilled our findings into 5 insights. We found that the skills & knowledge needed to be a successful AI-enhanced developer are organized into four domains (using Generative AI effectively, core software engineering, adjacent engineering, and adjacent non-engineering) deployed at critical junctures throughout a 6-step task workflow. In order to "future proof" developers for this age of AI, on-the-job learning initiatives and computer science degree programs will need to target both "soft" skills and the technical skills & knowledge in all four domains to reskill, upskill and safeguard against deskilling.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Norway > Central Norway > Trøndelag > Trondheim (0.05)
- Europe > Switzerland (0.04)
- (12 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Education > Curriculum > Subject-Specific Education (0.46)
- (2 more...)
Vibe Coding Is Coming for Engineering Jobs
On a 5K screen in Kirkland, Washington, four terminals blur with activity as artificial intelligence generates thousands of lines of code. Steve Yegge, a veteran software engineer who previously worked at Google and AWS, sits back to watch. "This one is running some tests, that one is coming up with a plan. I am now coding on four different projects at once, although really I'm just burning tokens," Yegge says, referring to the cost of generating chunks of text with a large language model (LLM). Learning to code has long been seen as the ticket to a lucrative, secure career in tech.
Ultrasound Lung Aeration Map via Physics-Aware Neural Operators
Wang, Jiayun, Ostras, Oleksii, Sode, Masashi, Tolooshams, Bahareh, Li, Zongyi, Azizzadenesheli, Kamyar, Pinton, Gianmarco, Anandkumar, Anima
Lung ultrasound is a growing modality in clinics for diagnosing and monitoring acute and chronic lung diseases due to its low cost and accessibility. Lung ultrasound works by emitting diagnostic pulses, receiving pressure waves and converting them into radio frequency (RF) data, which are then processed into B-mode images with beamformers for radiologists to interpret. However, unlike conventional ultrasound for soft tissue anatomical imaging, lung ultrasound interpretation is complicated by complex reverberations from the pleural interface caused by the inability of ultrasound to penetrate air. The indirect B-mode images make interpretation highly dependent on reader expertise, requiring years of training, which limits its widespread use despite its potential for high accuracy in skilled hands. To address these challenges and democratize ultrasound lung imaging as a reliable diagnostic tool, we propose LUNA, an AI model that directly reconstructs lung aeration maps from RF data, bypassing the need for traditional beamformers and indirect interpretation of B-mode images. LUNA uses a Fourier neural operator, which processes RF data efficiently in Fourier space, enabling accurate reconstruction of lung aeration maps. LUNA offers a quantitative, reader-independent alternative to traditional semi-quantitative lung ultrasound scoring methods. The development of LUNA involves synthetic and real data: we simulate synthetic data with an experimentally validated approach and scan ex vivo swine lungs as real data. Trained on abundant simulated data and fine-tuned with a small amount of real-world data, LUNA achieves robust performance, demonstrated by an aeration estimation error of 9% in ex vivo lung scans. We demonstrate the potential of reconstructing lung aeration maps from RF data, providing a foundation for improving lung ultrasound reproducibility and diagnostic utility.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > North Carolina (0.04)
- North America > United States > Washington > King County > Kirkland (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
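The Fourier-neural-operator idea at the core of LUNA can be illustrated with a single spectral convolution: transform the signal to Fourier space, apply learned weights to the lowest frequency modes, and transform back. Everything concrete below (signal length, mode count, random weights) is an illustrative assumption, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_conv1d(x, weights, modes):
    # One FNO-style spectral layer: FFT the signal, multiply the lowest
    # `modes` frequencies by learned complex weights, inverse FFT.
    x_ft = np.fft.rfft(x)
    out_ft = np.zeros_like(x_ft)
    out_ft[:modes] = x_ft[:modes] * weights[:modes]
    return np.fft.irfft(out_ft, n=len(x))

signal = rng.standard_normal(64)   # stand-in for one line of RF data
weights = rng.standard_normal(16) + 1j * rng.standard_normal(16)
y = spectral_conv1d(signal, weights, modes=16)
print(y.shape)
```

Truncating to the low modes is what makes the operator resolution-independent: the same weights apply whatever the input length.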
MATATA: A weakly-supervised MAthematical Tool-Assisted reasoning for Tabular Applications
Vinayagame, Vishnou, Senay, Gregory, Martí, Luis
Mathematical reasoning capabilities are increasing with tool-augmented language agents, but methods often rely either on closed-source or large models, external data, or extensive prompt engineering. This work introduces MATATA, a novel cost-effective method to train LLM agents for tabular data problems through reasoning, planning, and tool use. With a progressive self-improvement paradigm and iterative weak supervision, it empowers 3.8B/8B Small Language Models (SLMs), particularly suited for local hosting and sensitive business contexts where data privacy is crucial. By employing flexible and reusable tools across different datasets, it achieves robust performance with effective scalability across shared tasks. Experiments show that MATATA reaches state-of-the-art performance on FinQA and TAT-QA among reasoning frameworks based on open-source models. Moreover, MATATA models compete with GPT-4 based frameworks on TabMWP, while being SLMs.
- North America > United States > Washington > King County > Kirkland (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (3 more...)
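The tool-use loop such a tabular agent runs can be sketched with a stub planner. The table, tools, and hard-coded plan below are hypothetical: in the actual system an SLM generates the reasoning and tool calls, and MATATA's training (self-improvement, weak supervision) is not shown.

```python
# Toy table and tools; "$i" in an argument refers to the result of step i.
TABLE = {"revenue": [100, 120, 150], "cost": [80, 90, 100]}

def tool_column_sum(col):
    return sum(TABLE[col])

def tool_subtract(a, b):
    return a - b

TOOLS = {"column_sum": tool_column_sum, "subtract": tool_subtract}

def plan(question):
    # Stub planner returning (tool, args) steps; a real agent would
    # generate these with a language model.
    return [("column_sum", ("revenue",)),
            ("column_sum", ("cost",)),
            ("subtract", ("$0", "$1"))]

def run(question):
    results = []
    for tool, args in plan(question):
        resolved = [results[int(a[1:])] if isinstance(a, str) and a.startswith("$")
                    else a
                    for a in args]
        results.append(TOOLS[tool](*resolved))
    return results[-1]

print(run("What is total profit?"))  # 370 - 270 = 100
```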
Switchable deep beamformer for high-quality and real-time passive acoustic mapping
Zeng, Yi, Li, Jinwei, Zhu, Hui, Lu, Shukuan, Li, Jianfeng, Cai, Xiran
Passive acoustic mapping (PAM) is a promising tool for monitoring acoustic cavitation activities in the applications of ultrasound therapy. Data-adaptive beamformers for PAM have better image quality compared to the time exposure acoustics (TEA) algorithms. However, the computational cost of data-adaptive beamformers is considerable. In this work, we develop a deep beamformer based on a generative adversarial network, which can switch between different transducer arrays and reconstruct high-quality PAM images directly from radio frequency ultrasound signals with low computational cost. The deep beamformer was trained on a dataset consisting of simulated and experimental cavitation signals of single and multiple microbubble clouds measured by different (linear and phased) arrays covering 1-15 MHz. We compared the performance of the deep beamformer to TEA and three different data-adaptive beamformers using the simulated and experimental test dataset. Compared with TEA, the deep beamformer reduced the energy spread area by 18.9%-65.0% and improved the image signal-to-noise ratio by 9.3-22.9 dB on average for the different arrays in our data. Compared to the data-adaptive beamformers, the deep beamformer reduced the computational cost by three orders of magnitude, achieving 10.5 ms image reconstruction speed in our data, while the image quality was as good as that of the data-adaptive beamformers. These results demonstrated the potential of the deep beamformer for high-resolution monitoring of microbubble cavitation activities for ultrasound therapy.
- Asia > China > Shanghai > Shanghai (0.05)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- North America > United States > Washington > King County > Kirkland (0.04)
- (6 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.67)
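The TEA baseline the deep beamformer is compared against is itself easy to sketch: delay each channel by the propagation time from a candidate source point, sum across channels, and integrate the energy. The array geometry, sampling rate, and impulse signals below are illustrative, not from the paper.

```python
import math

C = 1540.0   # speed of sound in tissue, m/s (illustrative)
FS = 20e6    # sample rate, Hz (illustrative)

def tea_pixel_energy(rf, elements, point):
    # Delay-and-sum for one candidate source pixel: align each channel
    # by its propagation delay, sum across channels, integrate energy.
    n = len(rf[0])
    summed = [0.0] * n
    for sig, (ex, ey) in zip(rf, elements):
        shift = int(round(math.hypot(point[0] - ex, point[1] - ey) / C * FS))
        for t in range(n - shift):
            summed[t] += sig[t + shift]
    return sum(s * s for s in summed)

def impulse_signal(n, idx):
    s = [0.0] * n
    s[idx] = 1.0
    return s

# Synthetic data: an impulse source at (0, 20 mm) recorded by 3 elements.
elements = [(-0.005, 0.0), (0.0, 0.0), (0.005, 0.0)]
source = (0.0, 0.02)
rf = [impulse_signal(400, int(round(math.hypot(source[0] - ex, source[1] - ey) / C * FS)))
      for ex, ey in elements]
print(tea_pixel_energy(rf, elements, source))         # coherent: highest at the true source
print(tea_pixel_energy(rf, elements, (0.004, 0.02)))  # incoherent: lower off-source
```

The per-pixel loop over channels and samples is exactly the cost the learned beamformer amortizes into a single network pass.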
Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes
Chen, Jianshu, Wang, Chong, Xiao, Lin, He, Ji, Li, Lihong, Deng, Li
In sequential decision making, it is often important and useful for end users to understand the underlying patterns or causes that lead to the corresponding decisions. However, typical deep reinforcement learning algorithms seldom provide such information due to their black-box nature. In this paper, we present a probabilistic model, Q-LDA, to uncover latent patterns in text-based sequential decision processes. The model can be understood as a variant of latent topic models that are tailored to maximize total rewards; we further draw an interesting connection between an approximate maximum-likelihood estimation of Q-LDA and the celebrated Q-learning algorithm. We demonstrate in the text-game domain that our proposed method not only provides a viable mechanism to uncover latent patterns in decision processes, but also obtains state-of-the-art rewards in these games.
- Asia > Middle East > Jordan (0.05)
- North America > United States > Washington > King County > Redmond (0.04)
- North America > United States > Washington > King County > Kirkland (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
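The Q-learning algorithm the paper draws its connection to can be shown in its classic tabular form. The 4-state chain environment below is an illustrative stand-in for the paper's text games, not an example from the paper.

```python
import random

random.seed(0)
N_STATES, ACTIONS = 4, [0, 1]   # action 1 moves right, 0 moves left
GOAL = N_STATES - 1
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma = 0.5, 0.9

def step(s, a):
    # Deterministic chain: reward 1 only on reaching the goal state.
    s2 = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for _ in range(2000):
    s = 0
    for _ in range(20):
        a = random.choice(ACTIONS)   # random behavior policy (off-policy)
        s2, r, done = step(s, a)
        # Classic Q-learning update toward the bootstrapped target.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
        if done:
            break

policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)   # greedy policy moves right toward the goal
```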
FairHome: A Fair Housing and Fair Lending Dataset
Bagalkotkar, Anusha, Karmakar, Aveek, Arnson, Gabriel, Linda, Ondrej
We present a Fair Housing and Fair Lending dataset (FairHome): A dataset with around 75,000 examples across 9 protected categories. To the best of our knowledge, FairHome is the first publicly available dataset labeled with binary labels for compliance risk in the housing domain. We demonstrate the usefulness and effectiveness of such a dataset by training a classifier and using it to detect potential violations when using a large language model (LLM) in the context of real-estate transactions. We benchmark the trained classifier against state-of-the-art LLMs including GPT-3.5, GPT-4, LLaMA-3, and Mistral Large in both zero-shot and few-shot contexts. Our classifier outperformed all of them with an F1-score of 0.91, underscoring the effectiveness of our dataset. WARNING: Some of the examples included in the paper are not polite, insofar as they reveal bias that might feel discriminatory to the readers.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Washington > King County > Kirkland (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (2 more...)
- Law (1.00)
- Banking & Finance > Real Estate (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
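The F1-score the benchmark reports (0.91 for the trained classifier) is the harmonic mean of precision and recall. A minimal sketch with illustrative binary labels — not FairHome data:

```python
def f1_score(y_true, y_pred):
    # F1 = 2 * precision * recall / (precision + recall) for binary labels.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Illustrative labels: 1 = compliance risk, 0 = no risk.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))
```

F1 is a natural choice here because risk labels are typically imbalanced, where raw accuracy would be misleading.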
Denoising Plane Wave Ultrasound Images Using Diffusion Probabilistic Models
Asgariandehkordi, Hojat, Goudarzi, Sobhan, Sharifzadeh, Mostafa, Basarab, Adrian, Rivaz, Hassan
Ultrasound plane wave imaging is a cutting-edge technique that enables high frame-rate imaging. However, high frame-rate ultrasound images suffer from high noise levels, hindering their wider adoption. Therefore, the development of a denoising method becomes imperative to augment the quality of plane wave images. Drawing inspiration from Denoising Diffusion Probabilistic Models (DDPMs), our proposed solution aims to enhance plane wave image quality. Specifically, the method considers the distinction between low-angle and high-angle compounding plane waves as noise and effectively eliminates it by adapting a DDPM to beamformed radiofrequency (RF) data. The method underwent training using only 400 simulated images. In addition, our approach employs natural image segmentation masks as intensity maps for the generated images, resulting in accurate denoising for various anatomy shapes. The proposed method was assessed across simulation, phantom, and in vivo images. The results of the evaluations indicate that our approach not only enhances image quality on simulated data but is also effective on phantom and in vivo data. Comparative analysis with other methods underscores the superiority of our proposed method across various evaluation metrics. The source code and trained model will be released along with the dataset at: http://code.sonography.ai
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- North America > United States > Washington > King County > Kirkland (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (6 more...)
- Health & Medicine > Diagnostic Medicine > Imaging (0.95)
- Health & Medicine > Therapeutic Area (0.68)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.70)
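The forward (noising) process a DDPM is trained to invert has a closed form: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise. A minimal sketch with an illustrative linear beta schedule; the learned reverse (denoising) step and the paper's RF-specific adaptation are not shown.

```python
import math
import random

random.seed(0)

# Illustrative linear noise schedule; T and the beta range are assumptions.
T = 100
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b          # abar_t = product of (1 - beta_s) up to t
    alpha_bars.append(prod)

def q_sample(x0, t):
    # Sample x_t given x_0 in closed form (no need to iterate t steps).
    ab = alpha_bars[t]
    return [math.sqrt(ab) * x + math.sqrt(1 - ab) * random.gauss(0, 1)
            for x in x0]

x0 = [math.sin(i / 4) for i in range(32)]   # stand-in for a clean RF line
x_noisy = q_sample(x0, T - 1)               # heavily corrupted by t = T
```

The denoiser is trained to predict the added noise at each t; at inference it walks back from x_T to a clean estimate.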
Optimization of array encoding for ultrasound imaging
Spainhour, Jacob, Smart, Korben, Becker, Stephen, Bottenus, Nick
Objective: The transmit encoding model for synthetic aperture imaging is a robust and flexible framework for understanding the effects of acoustic transmission on ultrasound image reconstruction. Our objective is to use machine learning (ML) to construct scanning sequences, parameterized by time delays and apodization weights, that produce high-quality B-mode images. Approach: We use a custom ML model in PyTorch with simulated RF data from Field II to probe the space of possible encoding sequences for those that minimize a loss function that describes image quality. This approach is made computationally feasible by a novel formulation of the derivative for delay-and-sum beamforming. Main Results: When trained for a specified experimental setting (imaging domain, hardware restrictions, etc.), our ML model produces optimized encoding sequences that, when deployed in the REFoCUS imaging framework, improve a number of standard quality metrics, including resolution, field of view, and contrast, over conventional sequences. We demonstrate these results experimentally on both wire targets and a tissue-mimicking phantom. Significance: This work demonstrates that the set of commonly used encoding schemes represent only a narrow subset of those available. Additionally, it demonstrates the value for ML tasks in synthetic transmit aperture imaging to consider the beamformer within the model, instead of purely as a post-processing step.
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- North America > United States > Washington > King County > Kirkland (0.04)
- North America > United States > Virginia > Norfolk City County > Norfolk (0.04)
- (2 more...)
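The core move of optimizing transmit parameters against an image-quality loss can be illustrated in miniature: gradient ascent on per-element delays to maximize coherence at a focal point. The geometry, objective, and finite-difference gradients below are toy stand-ins for the paper's differentiable PyTorch beamformer and Field II simulations.

```python
import math

C, F = 1540.0, 5e6   # speed of sound (m/s) and center frequency (Hz), illustrative
elements = [(x * 1e-3, 0.0) for x in range(-4, 5)]   # 9 elements, 1 mm pitch
focus = (0.0, 0.03)
dists = [math.hypot(focus[0] - ex, focus[1] - ey) for ex, ey in elements]

def focal_amplitude(taus):
    # |sum_i exp(j * 2*pi*F*(tau_i - d_i / C))|: coherent sum of element
    # contributions at the focal point, as a stand-in for image quality.
    re = sum(math.cos(2 * math.pi * F * (t - d / C)) for t, d in zip(taus, dists))
    im = sum(math.sin(2 * math.pi * F * (t - d / C)) for t, d in zip(taus, dists))
    return math.hypot(re, im)

taus = [0.0] * len(elements)
lr, h = 1e-17, 1e-12   # tiny steps: delays are on the order of nanoseconds
for _ in range(200):
    grad = []
    for i in range(len(taus)):
        bumped = taus[:]
        bumped[i] += h   # finite-difference gradient (autograd stand-in)
        grad.append((focal_amplitude(bumped) - focal_amplitude(taus)) / h)
    taus = [t + lr * g for t, g in zip(taus, grad)]

print(focal_amplitude(taus) > focal_amplitude([0.0] * len(elements)))
```

The paper's contribution is making the real version of this loop tractable by differentiating through delay-and-sum beamforming directly, rather than treating the beamformer as a fixed post-processing step.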